Gradual Training Method for Denoising Auto Encoders
Authors
Abstract
Stacked denoising auto encoders (DAEs) are well known to learn useful deep representations, which can be used to improve supervised training by initializing a deep network. We investigate a training scheme for a deep DAE in which DAE layers are gradually added and lower layers keep adapting as additional layers are added. We show that in the regime of mid-sized datasets, this gradual training provides a small but consistent improvement over stacked training in both reconstruction quality and classification error on the MNIST and CIFAR datasets.

1 GRADUAL TRAINING OF DENOISING AUTOENCODERS

We test here gradual training of deep denoising auto encoders: the network is trained layer by layer, but lower layers keep adapting throughout training. To allow lower layers to adapt continuously, noise is injected at the input level. This training procedure differs from the stacked training of auto encoders (Vincent et al., 2010).

More specifically, in gradual training the first layer of the deep DAE is trained as in stacked training, producing a layer of weights w1. Then, when the second-layer autoencoder is added, its weights w2 are tuned jointly with the already-trained weights w1. Given a training sample x, we generate a noisy version x̃, feed it to the 2-layered DAE, and compute the activations at the subsequent layers h1 = Sigmoid(w1ᵀ x̃), h2 = Sigmoid(w2ᵀ h1), and y = Sigmoid(w′3ᵀ h2). Importantly, the loss function is computed against the clean input x and is used to update all the weights, including w1. Similarly, training a 3rd layer involves tuning w1 and w2 in addition to w3 and w′4.

2 EXPERIMENTAL PROCEDURES

We compare the performance of gradual and stacked training in two learning setups: an unsupervised denoising task, and a supervised classification task initialized with the weights learned in the unsupervised phase. Evaluations were made on three benchmarks, MNIST, CIFAR-10 and CIFAR-100, but we show only MNIST results here due to space constraints. We used a test subset of 10,000 samples and several training-set sizes, all maintaining a uniform distribution over classes. Hyperparameters, including the learning rate, SGD batch size, momentum and weight decay, were selected using a second level of cross-validation. In the supervised experiments, training was early-stopped after 35 epochs without improvement. The results reported below are averages over 3 train-validation splits.

Since gradual training involves updating lower layers, every presentation of a sample involves more weight updates than in a single-layered DAE. To compare stacked and gradual training on a common ground, we limited gradual training to use the same budget of weight-update steps as stacked training. For example, when training the second layer for n epochs in gradual training, we allocate 2n training epochs to stacked training (details in the full paper).

[Figure: a) Unsupervised training, b) Supervised training]
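As a concrete illustration of the joint update described in Section 1, the following is a minimal sketch, not the authors' implementation: the layer sizes, Gaussian input noise, mean-squared reconstruction loss, and plain SGD settings are assumptions made for readability. What it does mirror from the text is that noise is injected at the input, the reconstruction loss is computed against the clean input x, and the gradient updates reach all previously trained layers (w1, w2, ...) while each new layer is trained.

# Minimal PyTorch sketch of gradual DAE training (illustrative; layer sizes, noise level,
# MSE loss and SGD settings are assumptions, not taken from the paper).
import torch
import torch.nn as nn

def train_gradual_dae(data, sizes=(784, 500, 250), noise_std=0.25,
                      epochs_per_layer=10, lr=0.1):
    """Grow a deep DAE layer by layer; lower layers keep adapting as new ones are added."""
    encoders = nn.ModuleList()
    for depth in range(1, len(sizes)):
        # Add the next encoder layer w_k and a fresh decoder w'_{k+1} that maps
        # the top hidden layer straight back to the input space.
        encoders.append(nn.Linear(sizes[depth - 1], sizes[depth]))
        decoder = nn.Linear(sizes[depth], sizes[0])
        # All parameters, including the already-trained lower layers, go into the
        # optimizer, so w_1..w_{k-1} keep adapting while w_k is being trained.
        params = list(encoders.parameters()) + list(decoder.parameters())
        opt = torch.optim.SGD(params, lr=lr)
        for _ in range(epochs_per_layer):
            for x in data:                                     # x: (batch, sizes[0]) clean inputs
                x_noisy = x + noise_std * torch.randn_like(x)  # inject noise at the input level
                h = x_noisy
                for enc in encoders:
                    h = torch.sigmoid(enc(h))                  # h_k = Sigmoid(w_k^T h_{k-1})
                y = torch.sigmoid(decoder(h))                  # y = Sigmoid(w'_{k+1}^T h_k)
                loss = nn.functional.mse_loss(y, x)            # loss against the *clean* input x
                opt.zero_grad()
                loss.backward()                                # gradients reach all layers, incl. w_1
                opt.step()
    return encoders  # encoder weights can initialize a supervised network

To mimic the equal-budget comparison above, the stacked baseline would be given 2n second-layer epochs whenever the gradual variant uses n, so that both schemes perform the same number of weight-update steps.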
Similar papers
Gradual training of deep denoising auto encoders
Stacked denoising auto encoders (DAEs) are well known to learn useful deep representations, which can be used to improve supervised training by initializing a deep network. We investigate a training scheme of a deep DAE, where DAE layers are gradually added and keep adapting as additional layers are added. We show that in the regime of mid-sized datasets, this gradual training provides a small ...
Learning Discrete Representations via Information Maximizing Self-Augmented Training
Our method is related to denoising auto-encoders (Vincent et al., 2008). Auto-encoders maximize a lower bound of mutual information (Cover & Thomas, 2012) between inputs and their hidden representations (Vincent et al., 2008), while the denoising mechanism regularizes the auto-encoders to be locally invariant. However, such a regularization does not necessarily impose the invariance on the hidd...
Learning invariant features through local space contraction
We present in this paper a novel approach for training deterministic auto-encoders. We show that by adding a well chosen penalty term to the classical reconstruction cost function, we can achieve results that equal or surpass those attained by other regularized auto-encoders as well as denoising auto-encoders on a range of datasets. This penalty term corresponds to the Frobenius norm of the Jac...
Marginalized Denoising Auto-encoders for Nonlinear Representations
Denoising auto-encoders (DAEs) have been successfully used to learn new representations for a wide range of machine learning tasks. During training, DAEs make many passes over the training dataset and reconstruct it from partial corruption generated from a pre-specified corrupting distribution. This process learns robust representation, though at the expense of requiring many training epochs, i...
Training Auto-encoders Effectively via Eliminating Task-irrelevant Input Variables
Auto-encoders are often used as building blocks of deep network classifiers to learn feature extractors, but task-irrelevant information in the input data may lead to bad extractors and result in poor generalization performance of the network. In this paper, via dropping the task-irrelevant input variables, the performance of auto-encoders can be obviously improved. Specifically, an importance-bas...
Journal: CoRR
Volume: abs/1504.02902
Pages: -
Publication date: 2015